Ranking Text Units According to Textual Saliency, Connectivity and Topic Aptness

Author

  • Antonio Sanfilippo
Abstract

An efficient use of lexical cohesion is described for ranking text units according to their contribution in defining the meaning of a text (textual saliency), their ability to form a cohesive sub-text (textual connectivity) and the extent and effectiveness to which they address the different topics which characterize the subject matter of the text (topic aptness). A specific application is also discussed where the method described is employed to build the indexing component of a summarization system to provide both generic and query-based indicative summaries.

1 Introduction

As information systems become a more integral part of personal computing, it appears clear that summarization technology must be able to address users' needs effectively if it is to meet the demands of a growing market in the area of document management. Minimally, the abridgement of a text according to a user's needs involves selecting the most salient portions of the text which are topically best suited to represent the user's interests. This selection must also take into consideration the degree of connectivity among the chosen text portions so as to minimize the danger of producing summaries which contain poorly linked sentences. In addition, the assessment of textual saliency, connectivity and topic aptness must be computed efficiently enough so that summarization can be conveniently performed on-line. The goal of this paper is to show how these objectives can be achieved through a conceptual indexing technique based on an efficient use of lexical cohesion.

… dens, Ian Johnson and Victor Poznanski for helpful comments on previous versions of this document. Many thanks also to Stephen Burns for internet programming support, Ralf Steinberger for assistance in dictionary conversion, and Charlotte Boynton for editorial help.

2 Background

Lexical cohesion has been widely used in text analysis for the comparative assessment of saliency and connectivity of text fragments.
Following Hoey (1991), a simple way of computing lexical cohesion in a text is to segment the text into units (e.g. sentences) and to count non-stop words which co-occur in each pair of distinct text units, as shown in Table 2 for the text in Table 1. Text units which contain a greater number of shared non-stop words are more likely to provide a better abridgement of the original text for two reasons:

  • the more often a word with high informational content occurs in a text, the more topical and germane to the text the word is …
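The pairwise counting step described in the Background excerpt can be illustrated with a minimal Python sketch. It is not the paper's implementation: the stop list, the naive sentence splitter, the exact word-form matching (no stemming or dictionary lookup) and the function names `split_units`, `content_words`, `cohesion_links` and `rank_units` are all assumptions made for illustration. Ranking units by the sum of their link counts is likewise only one plausible way to use the counts.

```python
import re
from itertools import combinations

# Assumption: a tiny illustrative stop list, not the one used in the paper.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is",
              "are", "it", "that", "this", "for", "with", "as", "by"}

def split_units(text):
    """Naively segment text into sentence-like units after ., ! or ?."""
    return [u.strip() for u in re.split(r"(?<=[.!?])\s+", text) if u.strip()]

def content_words(unit):
    """Return the set of lower-cased non-stop words in a text unit."""
    return {w for w in re.findall(r"[a-z]+", unit.lower())
            if w not in STOP_WORDS}

def cohesion_links(units):
    """Count shared non-stop words for each pair of distinct units."""
    words = [content_words(u) for u in units]
    return {(i, j): len(words[i] & words[j])
            for i, j in combinations(range(len(units)), 2)}

def rank_units(units):
    """Order unit indices by total shared-word count (hypothetical score)."""
    scores = [0] * len(units)
    for (i, j), shared in cohesion_links(units).items():
        scores[i] += shared
        scores[j] += shared
    # Ties are broken by original position in the text.
    return sorted(range(len(units)), key=lambda i: (-scores[i], i))

if __name__ == "__main__":
    sample = ("The cat chased the mouse. The mouse hid in the barn. "
              "The barn was old. Rain fell all day.")
    units = split_units(sample)
    for idx in rank_units(units):
        print(idx, units[idx])
```

In the sample, the second sentence shares a word with each of its two neighbours and so ranks first, while the last sentence shares no content words with any other unit and ranks last.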


Similar resources

Graph-based Visual Saliency Model using Background Color

Visual saliency is a cognitive psychology concept that makes some stimuli of a scene stand out relative to their neighbors and attract our attention. Computing visual saliency is a topic of recent interest. Here, we propose a graph-based method for saliency detection, which contains three stages: pre-processing, initial saliency detection and final saliency detection. The initial saliency map i...


TEXTUAL AND INTER-TEXTUAL ANALYSES OF IRANIAN EFL UNDERGRADUATES’ TYPES OF ENGLISH READING TOWARDS DEVELOPING A CAREFUL READING FRAMEWORK

This study investigated textual and inter-textual reading of a group of Iranian EFL undergraduates’ careful English reading types. In this research, Khalifa and Weir’s (2009) reading framework was used to propose a more inclusive aspect of a careful reading framework and the reading construct for instructional and assessment goals. The participants of this study were B.A. students of English Tr...


Text Summarization through Entailment-based Minimum Vertex Cover

Sentence connectivity is a textual characteristic that may be incorporated intelligently for the selection of sentences of a well meaning summary. However, the existing summarization methods do not utilize its potential fully. The present paper introduces a novel method for single-document text summarization. It poses the text summarization task as an optimization problem, and attempts to solve ...


MCU at NTCIR: Chinese Fact Validation via SVM Cotext Ranking

Validating factoid descriptions in text is the subtask of finding the textual entailment relation between the given hypothesis and unlabeled raw corpus. By means of integrating multiple natural language processing units, higher performance could be reasonably achieved. In this paper, we propose a context ranking model-based and trainable framework under the condition of part-of-speech tagging infor...


Exploring the Relationship Between Modality and Readability Across Different Text Types

With regard to the relationship between the use of modality and readability levels of texts, two opposing views have been raised. The first view endorses a direct positive relationship between modality and readability in the sense that the use of modality increases textual understandability. The second view is that the use of modality leads to an increase in the number of words, resulting in readabilit...




Publication date: 1998